85 research outputs found

    Analysis of nanopore detector measurements using Machine-Learning methods, with application to single-molecule kinetic analysis

    Background: A nanopore detector has a nanometer-scale trans-membrane channel across which a potential difference is established, resulting in an ionic current through the channel in the pA-nA range. A distinctive channel current blockade signal is created as individually "captured" DNA molecules interact with the channel and modulate the channel's ionic current. The nanopore detector is sensitive enough that nearly identical DNA molecules can be classified with very high accuracy using machine learning techniques such as Hidden Markov Models (HMMs) and Support Vector Machines (SVMs).
    Results: A non-standard implementation of an HMM, emission inversion, is used for improved classification. Additional features are also considered for the feature vector employed by the SVM for classification: the addition of a single feature representing spike density is shown to notably improve classification results. Another, much larger, feature set expansion was studied (2500 additional features instead of 1), deriving from including all the HMM's transition probabilities. The expanded features can introduce redundant, noisy information (as well as diagnostic information) into the current feature set, and thus degrade classification performance. A hybrid Adaptive Boosting approach was used for feature selection to alleviate this problem.
    Conclusion: The methods shown here, for more informed feature extraction, both improve classification and provide biologists and chemists with tools for obtaining a better understanding of the kinetic properties of molecules of interest.

    Preliminary nanopore cheminformatics analysis of aptamer-target binding strength

    Background: Aptamers are nucleic acids selected for their ability to bind to molecules of interest and may provide the basis for a whole new class of medicines. If the aptamer is simply a dsDNA molecule with a ssDNA overhang (a "sticky" end), then the segment of ssDNA that complements that overhang provides a known binding target, with binding strength adjustable according to the length of the overhang.
    Results: Two bifunctional aptamers are examined using a nanopore detector. They are chosen to provide sensitive, highly modulated blockade signals with their captured ends, while their un-captured regions are designed to have binding moieties for complementary ssDNA targets. The bifunctional aptamers are duplex DNA on their channel-captured portion, and single-stranded DNA on their portion with binding ability. For short ssDNA, the binding is merely to the complementary strand of DNA, which is what is studied here for 5-base and 6-base overhangs.
    Conclusion: A preliminary statistical analysis using hidden Markov models (HMMs) indicates a clear change in the blockade pattern upon binding by the single captured aptamer. This is also consistent with the hypothesis that significant conformational changes occur during the annealing binding event. In further work the objective is to extend this ssDNA portion to be a well-studied ~80 base ssDNA aptamer, joined to the same bifunctional aptamer molecular platform.

    Hybrid MM/SVM structural sensors for stochastic sequential data

    In this paper we present preliminary results stemming from a novel application of Markov Models and Support Vector Machines to splice site classification of Intron-Exon and Exon-Intron (5' and 3') splice sites. We present the use of Markov-based statistical methods, in a log-likelihood discriminator framework, to create a non-summed, fixed-length feature vector for SVM-based classification. We also explore the use of Shannon-entropy-based analysis for automated identification of minimal-size models (where smaller models have known information loss according to the specified Shannon entropy representation). We evaluate a variety of kernels and kernel parameters in the classification effort. For comparison, we present results of the algorithms on splice-site datasets consisting of sequences from a variety of species.
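The idea of a "non-summed, fixed-length" log-likelihood feature vector can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: a zeroth-order positional model stands in for the Markov models, the toy sequences are invented, and the helper names `position_probs` and `log_odds_vector` are hypothetical. Instead of summing the per-position log-odds terms into one discriminator score, each term is kept as its own feature, so every fixed-length window yields a vector that could be passed to an SVM.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_probs(seqs, pseudocount=1.0):
    """Per-position base probabilities (a zeroth-order positional model).

    Assumption: the paper's Markov models may be higher order; order 0 keeps
    the sketch short.
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        table.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return table


def log_odds_vector(seq, pos_model, neg_model):
    """One log-likelihood-ratio term per position, deliberately left unsummed."""
    return [math.log(pos_model[i][b] / neg_model[i][b]) for i, b in enumerate(seq)]


if __name__ == "__main__":
    # Invented toy windows, not real splice-site data.
    positives = ["AGGTAAGT", "CGGTGAGT", "AGGTAAGA"]
    negatives = ["ACGTTGCA", "TTGACCGA", "GCATCGTA"]
    pos_model = position_probs(positives)
    neg_model = position_probs(negatives)
    features = log_odds_vector("AGGTAAGT", pos_model, neg_model)
    print(len(features), round(sum(features), 3))
```

Summing the vector recovers the usual scalar log-likelihood discriminator; keeping the terms separate lets a downstream classifier weight positions individually.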

    The NTD Nanoscope: potential applications and implementations

    Background: Nanopore transduction detection (NTD) offers prospects for a number of highly sensitive and discriminative applications, including: (i) single nucleotide polymorphism (SNP) detection; (ii) targeted DNA re-sequencing; (iii) protein isoform assaying; and (iv) biosensing via antibody or aptamer coupled molecules. Nanopore event transduction involves single-molecule biophysics, engineered information flows, and nanopore cheminformatics. The NTD Nanoscope has seen limited use in the scientific community, however, due to a lack of information about potential applications and the limited availability of the device itself. Meta Logos Inc. is developing both pre-packaged device platforms and component-level (unassembled) kit platforms (the latter described here). In both cases a lipid bi-layer workstation is first established, then augmentations and operational protocols are provided to produce a nanopore transduction detector. In this paper we provide an overview of the NTD Nanoscope applications and implementations. The NTD Nanoscope Kit, in particular, is a component-level reproduction of the standard NTD device used in previous research papers.
    Results: The NTD Nanoscope method is shown to functionalize a single nanopore with a channel current modulator that is designed to transduce events, such as binding to a specific target. To expedite set-up, calibration, and troubleshooting of the kit components and signal processing software in new lab settings, the NTD Nanoscope Kit is designed to include a set of test buffers and control molecules based on experiments described in previous NTD papers (the model systems briefly described in what follows). Server interfacing for advanced signal processing support is also briefly described.
    Conclusions: SNP assaying, SNP discovery, DNA sequencing and RNA-seq methods are typically limited by the error rate of the enzymes involved, such as methods involving the polymerase chain reaction (PCR) enzyme. The NTD Nanoscope offers a means to obtain higher accuracy, as it is a single-molecule method that does not inherently involve the use of enzymes, using a functionalized nanopore instead.

    Accumulation of GC donor splice signals in mammals

    The GT dinucleotide in the first two intron positions is the most conserved element of the U2 donor splice signals. However, in a small fraction of donor sites, GT is replaced by GC. A substantial enrichment of GC in donor sites of alternatively spliced genes has been observed previously in human, nematode and Arabidopsis, suggesting that GC signals are important for regulation of alternative splicing. We used parsimony analysis to reconstruct the evolution of donor splice sites and inferred 298 GT > GC conversion events compared to 40 GC > GT conversion events in primate and rodent genomes. Thus, there was substantial accumulation of GC donor splice sites during the evolution of mammals. Accumulation of GC sites might have been driven by selection for alternative splicing.

    SVM clustering


    Duration learning for analysis of nanopore ionic current blockades

    Background: Ionic current blockade signal processing, for use in nanopore detection, offers a promising new way to analyze single molecule properties, with potential implications for DNA sequencing. The alpha-Hemolysin transmembrane channel interacts with a translocating molecule in a nontrivial way, frequently evidenced by a complex ionic flow blockade pattern. Typically, recorded current blockade signals have several levels of blockade, with various durations, all obeying a fixed statistical profile for a given molecule. Hidden Markov Model (HMM) based duration learning experiments on artificial two-level Gaussian blockade signals helped us to identify a proper modeling framework. We then apply our framework to the real multi-level DNA hairpin blockade signal.
    Results: The identified upper level blockade state is observed with durations that are geometrically distributed (consistent with a physical decay process for remaining in any given state). We show that a mixture of convolution chains of geometrically distributed states is better for representing multimodal long-tailed duration phenomena. Based on learned HMM profiles we are able to classify 9 base-pair DNA hairpins with accuracy up to 99.5% on signals from same-day experiments.
    Conclusion: We have demonstrated several implementations for de novo estimation of the duration distribution probability density function within an HMM framework and applied our model topology to the real data. The proposed design could be useful in molecular analysis based on nanopore current blockade signals.
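The duration claims in this abstract can be made concrete with a small sketch. It is not the authors' implementation: the function names and the self-transition probability 0.9 are assumptions chosen for illustration. A single HMM state with self-transition probability `stay_prob` has a geometric dwell-time distribution whose mode is always a duration of 1, whereas several such states visited in series give the convolution of those geometric distributions, whose mode moves away from 1. A mixture, as named in the abstract, would be a weighted sum of such chain distributions.

```python
def geometric_pmf(stay_prob, duration):
    """P(dwell == duration) for one state with self-transition probability stay_prob."""
    if duration < 1:
        return 0.0
    return stay_prob ** (duration - 1) * (1.0 - stay_prob)


def chain_pmf(stay_prob, n_states, max_duration):
    """Dwell-time pmf for n_states identical geometric states visited in series.

    Returns a list indexed by duration (index 0 stays 0.0), truncated at
    max_duration.
    """
    single = [geometric_pmf(stay_prob, d) for d in range(max_duration + 1)]
    total = single[:]
    for _ in range(n_states - 1):
        # Discrete convolution of the running distribution with one more state.
        convolved = [0.0] * (max_duration + 1)
        for d1, p1 in enumerate(total):
            if p1 == 0.0:
                continue
            for d2 in range(1, max_duration + 1 - d1):
                convolved[d1 + d2] += p1 * single[d2]
        total = convolved
    return total


def mode(pmf):
    """Duration with the highest probability."""
    return max(range(len(pmf)), key=lambda d: pmf[d])


if __name__ == "__main__":
    one_state = chain_pmf(0.9, 1, 200)
    two_states = chain_pmf(0.9, 2, 200)
    print("single state mode:", mode(one_state))
    print("two-state chain mode:", mode(two_states))
```

The single-state case has mean dwell time 1 / (1 - stay_prob), which is the standard property of a geometric distribution; the chain trades that fixed shape for a peaked one at the cost of extra states.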

    Hidden Markov Model Variants and their Application

    Markov statistical methods may make it possible to develop an unsupervised learning process that can automatically identify genomic structure in prokaryotes in a comprehensive way. This approach is based on mutual information, probabilistic measures, hidden Markov models, and other purely statistical inputs. This approach also provides a uniquely common ground for comparative prokaryotic genomics. The approach is by its nature an ongoing effort: a multi-pass learning process in which each round is more informed than the last, allowing a shift to the more powerful methods available for supervised learning at each iteration. It is envisaged that this "bootstrap" learning process will also be useful as a knowledge discovery tool. For such an ab initio prokaryotic gene-finder to work, however, it needs a mechanism to identify critical motif structure, such as the motifs around the start of coding or start of transcription (and then, hopefully, more). For eukaryotes, even with better start-of-coding identification, parsing of eukaryotic coding regions by the HMM is still limited by the HMM's single gene assumption, as evidenced by the poor performance in alternatively spliced regions. To address these complications an approach is described to expand the states in a eukaryotic gene-predictor HMM, to operate with two layers of DNA parsing. This extension from the single layer gene prediction parse is indicated after preliminary analysis of the C. elegans alt-splice statistics. State profiles have made use of a novel hash-interpolating MM (hIMM) method. A new implementation for an HMM-with-Duration is also described, with far-reaching application to gene-structure identification and analysis of channel current blockade data.